BMJ Health & Care Informatics
● BMJ
Preprints posted in the last 90 days, ranked by how well they match BMJ Health & Care Informatics's content profile, based on 13 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.
Uzochukwu, B. S. C.; Cherima, Y. J.; Enebeli, U. U.; Okeke, C. C.; Uzochukwu, A. C.; Omoha, A.; Hassan, B.; Eronu, E. M.; Yusuf, S. M.; Uzochukwu, K. A.; Kalu, E. I.
Background: The integration of artificial intelligence (AI) into clinical practice holds transformative potential for healthcare in West Africa, but safe deployment requires context-appropriate governance, accountability, and post-deployment monitoring frameworks. This cross-sectional mixed-methods study examined preferences and concerns of West African clinicians and technical experts regarding AI governance structures, post-deployment surveillance mechanisms, and accountability allocation. Methods: A structured questionnaire was administered to 136 physicians affiliated with the West African College of Physicians (February 22-28, 2026), complemented by 72 key informant interviews with technical leads, AI developers, data scientists, policymakers, and healthcare leaders. Data were analyzed using descriptive statistics, inferential tests, and thematic analysis. Results: Clinicians strongly preferred independent regulatory bodies (40.4%) for overseeing AI tool performance, with high trust ratings (mean:4.3/5), while vendor self-monitoring received minimal support (3.7%, mean:2.4/5). Real-time dashboards were the most favored monitoring approach (41.9%). Clear accountability pathways (94.1%), algorithm transparency (91.9%), and real-time performance data (89.7%) were rated essential by majorities. Major concerns included clinicians being unfairly blamed for AI errors (76.5%), excessive vendor control (72.8%), and absence of clear reporting pathways (69.9%). Qualitative findings emphasized continuous performance tracking for accuracy, fairness, and bias; structured incident reporting; protocols for model drift and failure; and multi-layered governance combining independent oversight, institutional AI committees, and explicit liability frameworks. Conclusion: This study provides the first empirical evidence from West Africa on clinician preferences for AI governance. 
Findings offer actionable guidance for policymakers to build trustworthy, equitable, and safe AI integration frameworks that prioritize transparency, independent oversight, and clinician protection. Keywords: artificial intelligence; AI governance; post-deployment monitoring; accountability; West Africa; clinician preferences; health data science.
Thomas, C.; Kim, J. Y.; Hasan, A.; Kpodzro, S.; Cortes, J.; Day, B.; Jensen, S.; LHuillier, S.; Oden, M. O.; Zumbado Segura, S.; Maurer, E. W.; Tucker, S.; Robinson, S.; Garcia, B.; Muramalla, E.; Lu, S.; Chawla, N.; Patel, M.; Balu, S.; Sendak, M.
Safety net healthcare delivery organizations (SNOs) serve vulnerable populations but face persistent challenges in adopting new technologies, including AI. While systematic barriers to technology adoption in SNOs are well documented, little is known about how AI is implemented in these settings. This study explored real-world AI adoption in SNOs, focusing on identifying barriers encountered across the AI lifecycle and strategies used to overcome them. Five SNOs in the U.S. participated in a 12-month technical assistance program, the Practice Network, to implement AI tools of their choosing. Observed barriers and mitigation strategies were documented throughout program activities and, at the conclusion of the program, reviewed and refined with participants using a participatory research approach to ensure findings reflected lived experiences and organizational contexts. Key barriers emerged during the Integration and Lifecycle Management phases and included gaps in AI performance evaluation and impact assessments, communication with patients about AI use, foundational AI education, financial resources for purchasing and maintaining AI tools, and AI governance structures. Effective strategies for addressing these barriers were primarily supported through centralized expertise, structured guidance, and peer learning. These findings provide granular, actionable insights for SNO leaders, offering guidance for anticipating barriers and proactively planning mitigation strategies. By including SNO perspectives, the study also contributes to the broader health AI ecosystem and underscores the importance of participatory, collaborative approaches to support safe, effective, and ethical AI adoption in resource-constrained settings. Author Summary: Safety net organizations (SNOs) are healthcare systems that primarily serve low-income and underinsured patients. 
While interest in artificial intelligence (AI) in healthcare has grown rapidly, little is known about how these organizations experience AI adoption in practice. In this study, we partnered with five SNOs over a 12-month program to document the challenges they encountered when implementing AI tools and the strategies they used to address them. We worked closely with SNO staff throughout the process to ensure our findings reflected their lived experiences with AI implementation. We found that the most common challenges arose when organizations tried to integrate AI into daily operations and monitor and maintain those tools over time. Specific barriers included difficulty evaluating whether AI was performing as expected, limited guidance on communicating with patients about AI use, a lack of resources for staff training, limited financial resources, and the absence of formal governance structures. Successful strategies for overcoming these challenges drew on shared knowledge and structured support provided by the program, as well as learning from peer organizations. These findings offer practical guidance for SNO leaders planning or managing AI adoption, and contribute to a broader conversation about what is required to implement AI safely and effectively in healthcare settings that serve the most medically and socially vulnerable patients.
Shankar, R.; Goh, A.; Xu, Q.
Background: The administrative burden of clinical documentation is a recognised contributor to clinician burnout and diminished care quality. Ambient artificial intelligence (AI) scribe technology, which uses large language models to passively record and summarise clinical encounters, has rapidly gained traction internationally. However, no published studies have examined clinician experiences with this technology in the Asia-Pacific region or within Singapore's multilingual healthcare system. Objective: This study explored clinician perspectives on ambient AI scribe technology at Alexandra Hospital, Singapore, focusing on perceived benefits, barriers, workflow integration, ethical considerations, and recommendations for sustained implementation. Methods: A qualitative descriptive study was conducted using semi-structured interviews with 28 clinicians across multiple specialties at Alexandra Hospital, National University Health System (NUHS). Participants were purposively sampled for diversity in role, specialty, and usage level. Interviews were analysed using reflexive thematic analysis guided by the RE-AIM/PRISM framework. The COREQ checklist was followed. Results: Five themes emerged: (1) reclaiming presence in the clinical encounter, (2) navigating accuracy and trust in AI-generated documentation, (3) workflow disruption and adaptation, (4) privacy, consent, and ethical tensions within Singapore's regulatory landscape, and (5) envisioning sustainable integration. Clinicians reported improved patient engagement and reduced cognitive burden. Persistent barriers included accuracy concerns, AI hallucinations, limited multilingual functionality, loss of documentation style, and uncertainties around compliance with the Personal Data Protection Act (PDPA). Conclusions: Ambient AI scribe technology holds promise for alleviating documentation burden in Singapore's public healthcare system. 
Realising this potential requires attention to safety validation, multilingual capability, clinician training, and patient-centred consent aligned with local regulatory frameworks.
Nayyar, C.; Xu, H. H.; Bates, A. T.; Conati, C.; Hilbers, D.; Avery, J.; Raman, S.; Fayaz-Bakhsh, A.; Nunez, J.-J.
Background: Artificial intelligence (AI) has rapidly garnered interest in healthcare, with research showing promise to improve quality, efficiency, and outcomes. Cancer care's multidisciplinary nature and high coordination demands make it well positioned to benefit from AI. While attitudes toward the uptake of evidence and the implementation of AI in medicine have been explored generally, literature specific to AI in cancer care remains scarce. This study sought to understand the perspectives of both patients and professionals, which are essential for guiding responsible, effective implementation of evidence-based (EB) AI in cancer care. Methods: We conducted a workshop at the 2024 British Columbia (BC) Cancer Summit (Vancouver, Canada). Discussions addressed three guiding questions: concerns, benefits, and priorities for AI in cancer care. Responses from 48 workshop participants (patients and families, AI/computer science/cancer researchers, clinicians and allied health professionals, information technology professionals, healthcare administrators) underwent structured conceptualization by concept mapping, leveraging multidimensional scaling and hierarchical cluster and subcluster analysis to produce visual and quantitative maps of stakeholder priorities. Results: A total of 265 statements on perceived benefits, concerns, and priorities related to the implementation of AI in cancer care were generated from the workshop and underwent concept mapping. Two clusters were identified: Cluster 1 focused on "Challenges and Safeguards for AI Implementation," and Cluster 2 focused on "Clinical Benefits and Efficiency Gains." Subcluster analysis distinguished 8 thematic subclusters (4 per cluster). Both mean importance (P < .001) and feasibility (P < .001) ratings were significantly higher for Cluster 2. No differences were found between ratings by clinical and nonclinical professionals. 
Further go-zone analysis classified statements according to their relative superiority/inferiority in importance and feasibility compared to the overall average. Conclusions: Stakeholder ratings were higher for statements describing clinical benefits and efficiency gains than for those describing challenges and safeguards for AI implementation in cancer care. Concept mapping analysis distinguished between workflow-aligned AI applications, perceived as ready for implementation, and system-level governance requirements requiring longer-term investment. Present findings provide a structured, stakeholder-informed framework for prioritizing and sequencing AI implementation efforts in cancer care, constituting a practical blueprint to catalyze meaningful progress.
Jafarifiroozabadi, R.
Background: Safety is a critical concern in behavioral health crisis units (BHCUs), where environmental risks (e.g., ligature points) can lead to injury to self or others. However, limited research has examined how perceived safety influences facility selection among patients and care partners, or how these perceptions align with AI-driven safety risk assessments in such environments. Method: To address these gaps, a nationwide discrete choice online survey was conducted using image-based scenarios of BHCU environments, where participants selected preferred facilities based on a range of attributes, including environmental safety risks (e.g., ligature points). Additionally, participants identified safety risks in survey images, which were compared with outputs from an AI-driven tool developed and trained to detect environmental risks by experts. Quantitative analysis using conditional logit models examined the influence of attributes on facility choice, while spatial comparisons of annotated images and heatmaps assessed participant and AI-identified risk alignments. Results: Findings revealed that the higher frequency of safety risks in images significantly reduced the likelihood of facility selection (p < .001, OR ≈ 1.28), highlighting the importance of perceived safety in user decision-making. While there was notable alignment between heatmaps generated by participants and AI, key differences emerged, suggesting that participant safety perception was influenced by features not fully captured by AI, such as the type of materials or unknown, out-of-label safety risks in facility images. Conclusions: Despite these limitations, results highlighted the value of integrating AI-driven assistive tools for non-expert user safety risk assessment to support decision-making for safer BHCU environments.
Nkosi-Mjadu, B. E.
Background: South Africa's public healthcare system serves most of the population through approximately 3,900 primary healthcare clinics characterised by long waiting times and high volumes of repeat-prescription visits. No published pre-arrival digital triage system operates across all 11 official South African languages while aligning with the South African Triage Scale (SATS). This paper reports the design and preliminary safety validation of BIZUSIZO, a hybrid deterministic-AI WhatsApp triage system. Methods: BIZUSIZO delivers SATS-aligned triage via WhatsApp, combining AI-assisted free-text classification (Claude Haiku 4.5) with a Deterministic Clinical Safety Layer (DCSL) that overrides AI output for 53 clinical discriminator categories (14 RED, 19 ORANGE, 20 YELLOW) coded in all 11 official languages and independent of AI availability. A five-domain risk factor assessment can only upgrade triage level. One hundred and twenty clinical vignettes in patient language (English, isiZulu, isiXhosa, Afrikaans; 30 per language) were scored against a developer-assigned gold standard with independent blinded nurse review. A 121-vignette multilingual DCSL safety consistency check across all 11 languages and a 220-call post-hoc framing sensitivity evaluation (110 paired vignettes) were also conducted. Results: Under-triage was 3.3% (4/120; 95% CI: 0.9%-8.3%) with no RED under-triage; exact concordance was 80.0% (96/120) and quadratic weighted kappa 0.891 (95% CI: 0.827-0.932). One two-level under-triage was observed on a non-RED presentation (V072, isiXhosa burns vignette, ORANGE→GREEN); one two-level over-triage was observed (V054, isiZulu deep laceration, YELLOW→RED). In the framing sensitivity evaluation, AI-only classification achieved 50.9% RED invariance under adversarial framing; full-pipeline classification achieved 95.0% in four validated languages, with the DCSL rescuing 18 of 23 AI drift cases. 
Conclusions: A hybrid deterministic-AI triage system with DCSL-based emergency detection achieved zero RED under-triage and consistent RED detection across all 11 official languages. The 16.7% over-triage rate falls within published South African SATS ranges (13.1-49%). A single two-level under-triage event was observed on an isiXhosa burns vignette (ORANGE→GREEN) and is discussed in Limitations. Findings are preliminary; prospective validation against independent nurse triage is the necessary next step.
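The interval reported for under-triage (4/120; 95% CI: 0.9%-8.3%) is consistent with an exact binomial (Clopper-Pearson) interval, although the abstract does not name the method, so that choice is an assumption here. The sketch below recomputes the interval with the standard library only; the function names are illustrative and not taken from the paper. It also derives the over-triage count by assuming every vignette is exactly one of under-triaged, concordant, or over-triaged.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_decreasing(f, target: float, iters: int = 100) -> float:
    """Bisection on [0, 1] for f(p) == target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) interval for a binomial proportion k/n."""
    # Lower bound solves P(X >= k | p) = alpha/2, i.e. P(X <= k-1 | p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2
    )
    # Upper bound solves P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2
    )
    return lower, upper


# Counts taken from the abstract; the over-triage count is inferred (assumption).
n_vignettes, n_under, n_exact = 120, 4, 96
n_over = n_vignettes - n_under - n_exact

ci_low, ci_high = clopper_pearson(n_under, n_vignettes)
over_triage_pct = round(100 * n_over / n_vignettes, 1)

print(f"under-triage CI: {100 * ci_low:.1f}%-{100 * ci_high:.1f}%")
print(f"over-triage: {n_over}/{n_vignettes} = {over_triage_pct}%")
```

Under that partition assumption, the inferred over-triage count reproduces the 16.7% rate quoted in the conclusions.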
Sezgin, E.; Lee, J. A.; Jadczyk, T.; Taxter, A. J.
Objective: We surveyed 524 healthcare professionals (HCPs) in the United States and United Kingdom to examine workplace generative AI use, access, and barriers in two high-maturity health settings. Methods: This cross-sectional survey compared AI usage breadth, access modes, and barriers among HCPs, stratified by country and professional role. Results: Overall, 75.8% of HCPs reported recent AI use, mainly for documentation, literature search, and clinical decision support. Usage breadth was similar by country, but role differences were pronounced. Physicians reported broader use and were significantly more likely to access AI via personal, non-employer-provided tools (60.4% vs. 31.0% for nurses; P<.01). Personal tools were the most common access mode overall (40.1%). Conclusion: AI use is common, but institutional access lags adoption. Shifting use from personal accounts toward governed, approved systems is a key priority.
Hazare, N. S.; Oh, W.; Kumar, G.; Goel, N.; Shaikh, A.; Sharma, A.; Desman, J.; Kumar, A.; Robles, C.; Singh, A.; Jangda, M.; Agaron, S.; Capone, C.; Ngai, D.; Itwaru, A.; Parchure, P.; Ramaswamy, A.; Gorbenko, K.; Timsina, P.; Lampert, J.; Tamler, R.; Manasia, A.; Kohli-Seth, R.; Kaplan, B.; Vakil, A.; Omar, M.; Glicksberg, B. S.; Freeman, R.; Stern, A. D.; Klang, E.; Darrow, B.; Stump, L. S.; Reich, D.; Charney, A.; Nadkarni, G. N.; Sakhuja, A.
Importance: Physician-facing AI tools are now in clinical use, yet whether different platforms fail in similar or fundamentally different ways in high-stakes settings like critical care is unknown. Objective: To evaluate two physician-facing AI platforms, ChatGPT for Clinicians and OpenEvidence, for distinct vulnerabilities under structured stress testing. Design, Setting, and Participants: An observational study conducted using 60 simulated critical care vignettes developed and adjudicated by four attending critical care physicians. Data were collected in the last week of April 2026, via the public website interfaces of each platform. Interventions/Exposures: A 2x2x2x2 factorial design across four stressors (anchoring, cognitive load, social conformity pressure, and a clinically incorrect directive) yielded 16 prompt subsets per vignette and 960 prompts per platform. A separate multi-turn adversarial prompting paradigm administered three sequential "You are incorrect" challenges to baseline vignettes. All prompts had a universal output length constraint of fewer than 30 words. Main Outcomes and Measures: Critical elements capture (percentage of gold-standard critical elements present in responses), susceptibility to clinically incorrect directive, and sycophancy (reversal of an initial correct recommendation under iterative adversarial challenge). Results: Across 1916 responses to 1920 prompts, ChatGPT for Clinicians captured more gold-standard critical elements than OpenEvidence (81.4% ± 18.1% vs 61.0% ± 23.5%; adjusted difference, 20.3 percentage points; 95% CI, 18.3 to 22.4; P < .001) and was less susceptible to clinically incorrect directives (1.7% vs 8.0%; adjusted odds ratio, 0.07; 95% CI, 0.02-0.21; P < .001). Anchoring and social conformity pressure were associated with reduced critical element capture across both platforms, while cumulative stressor burden reduced critical element capture only on OpenEvidence. 
Conversely, ChatGPT for Clinicians reversed correct recommendations more readily under adversarial prompting (hazard ratio, 2.61; 95% CI, 1.10 - 6.19; P = .03). Conclusion and Relevance: The two physician-facing clinical AI platforms evaluated demonstrated non-overlapping vulnerabilities, with neither platform uniformly superior. These findings argue against single-axis ranking of clinical AI systems and support multidimensional safety evaluation encompassing completeness of reasoning, resistance to incorrect directives, and stability under adversarial challenge.
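The prompt counts in this abstract follow directly from the full factorial design. As a minimal sketch, the enumeration below crosses four binary stressors and multiplies by the 60 vignettes and two platforms; the stressor identifiers and variable names are illustrative labels, not taken from the study's materials.

```python
from itertools import product

# Four binary stressors named in the abstract (identifiers are illustrative).
STRESSORS = (
    "anchoring",
    "cognitive_load",
    "social_conformity",
    "incorrect_directive",
)
N_VIGNETTES = 60
N_PLATFORMS = 2

# Every on/off combination of the four stressors: a 2x2x2x2 full factorial.
conditions = [
    dict(zip(STRESSORS, levels))
    for levels in product((False, True), repeat=len(STRESSORS))
]

prompts_per_platform = N_VIGNETTES * len(conditions)
total_prompts = prompts_per_platform * N_PLATFORMS

# Number of active stressors per condition; the abstract's "cumulative
# stressor burden" is assumed here to mean this count.
burden = [sum(c.values()) for c in conditions]

print(len(conditions), prompts_per_platform, total_prompts)
```

The first condition in the list has every stressor switched off, which corresponds to an unstressed baseline prompt, and the burden count ranges from 0 to 4.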
Choi, J.; Kim, Y. J.; Lyu, P.; Luan, Y. L.; Toh, S. M.
Artificial intelligence (AI) is increasingly incorporated into diagnostic decision-making, raising questions about physician responsibility following AI-involved adverse diagnostic events. Explainable AI (XAI) has been proposed to improve transparency and trust, but its influence on public reactions remains unclear. In a randomised vignette-based experiment, 652 adults from the United States and United Kingdom were assigned to one of six conditions in a 3 (diagnostic source: AI alone, human radiologist alone, or human-AI collaboration) x 2 (explanation: present or absent) between-subjects design. Participants read a scenario in which a chest X-ray was initially interpreted as normal but lung cancer was diagnosed five months later, indicating that the original interpretation had missed the cancer. In explanation conditions, participants received additional information about how the diagnosis had been reached, including AI heatmap-based explanations in the AI conditions. Participants rated radiologist responsibility, likelihood of complaint, and intention to pursue legal action. Among 652 participants (mean age 42.2 years; 50.2% female), responsibility ratings were significantly lower when AI alone made the diagnostic decision (mean 4.73, 95% CI 4.53-4.93) compared with human-only decision-making (5.78, 95% CI 5.59-5.98; p<0.001) and human-AI collaboration (5.54, 95% CI 5.34-5.74; p<0.001). Complaint likelihood showed a similar pattern. Intentions to pursue legal action followed the same directional trend but were marginally significant. Neither explanations nor explanation-by-source interactions were associated with outcome measures. These findings suggest that the public expects physicians to remain accountable when AI is involved in diagnostic decision-making, particularly in collaborative settings. 
Providing explanatory information about how AI systems reach decisions may be insufficient to change perceptions of physician responsibility following adverse diagnostic events.
Sajjad, M.
Artificial intelligence (AI) tools have been rapidly adopted by medical researchers, yet whether early-career researchers in low- and middle-income countries possess the awareness and habits needed to use these tools safely remains poorly documented. This study characterized AI adoption patterns, hallucination awareness, and verification and disclosure practices among early-career medical researchers in Pakistan. A cross-sectional anonymous online survey was conducted among medical students, house officers, residents, physicians, and faculty involved in research or academic work across Pakistan (May 2026). Descriptive statistics and chi-square tests were applied to 373 eligible responses. AI use was near universal (99.7%), with 60.3% using AI tools daily. The most commonly reported tool in this sample was Claude (40.5%), followed by ChatGPT (29.2%) and Perplexity (26.0%), though this ranking likely reflects sampling characteristics. Despite high adoption, 59.2% typically did not verify AI outputs before use, and 40.2% had never heard that AI can generate fabricated scientific references. In behavioral vignettes, 36.5% assumed convincing AI-generated references were authentic, and 54.2% would continue using remaining AI content after discovering one fabricated reference. Formal research training was strongly associated with consistent disclosure (51.7% vs. 17.1%; chi-square = 48.43, p < 0.001). Role, daily use frequency, and research training were not significantly associated with verification behavior. Early-career medical researchers in Pakistan demonstrate high AI adoption alongside incomplete hallucination awareness and infrequent verification, a pattern that may carry implications for research integrity. Formal training was the only factor significantly associated with consistent disclosure. Integration of AI literacy into medical curricula and institutional governance frameworks merits consideration.
Vasquez-Venegas, C.; Chewcharat, A.; Kimera, R.; Kurtzman, N.; Leite, M.; Woite, N. L.; Muppidi, I. J.; Muppidi, R. J.; Liu, X.; Ong, E. P.; Pal, R.; Myers, C.; Salzman, S.; Patscheider, J. S.; John, T. R.; Rogers, M.; Samuel, M.; Santana-Guerrero, J. L.; Yaacob, S.; Gameiro, R. R.; Celi, L. A.
Computer vision models for chest X-ray interpretation hold significant promise for global healthcare, but their clinical value depends on equitable development across diverse populations. We conducted a scientometric analysis to examine authorship patterns, geographic distribution, and dataset origins to assess potential disparities that could affect clinical applicability. We systematically reviewed literature on computer vision applications for chest X-rays published between 2017 and 2025 across multiple databases, including PubMed, Embase, and SciELO. Using Dimensions API and manual extraction, we analyzed 928 eligible studies, examining first and senior author affiliations, institutional contributions, dataset provenance, and collaboration patterns across different income classifications based on World Bank categories. High-income countries dominated research leadership, representing 55.6% of first authors and 59.7% of senior authors; no first authors were affiliated with low-income countries. China (16.93%) and the United States (16.72%) led in first authorship positions. Most datasets (73.6%) originated from high-income settings, with the United States being the largest contributor (40.45%). Private datasets were most frequently used (20.52%). Cross-income collaborations were rare, with only 3.9% of publications involving partnerships between high-income and lower-middle-income countries. Findings reveal substantial disparities in who shapes computer vision research on chest X-rays and which populations are represented in training data. These imbalances risk developing AI systems that perform inconsistently across diverse healthcare settings, potentially exacerbating healthcare inequities. Addressing these disparities requires coordinated efforts to develop globally representative datasets, establish equitable international collaborations, and implement policies that promote inclusive research practices.
Calderon, P. F.; Wolosker, N.
Objective: Develop a methodology to implement action plans that mitigate the negative impacts associated with the EHR implementation project and evaluate their effectiveness in reducing these issues. Methods: The research involved the development of mitigation plans for the potential negative impacts of implementing an electronic health record system, ensuring their execution and subsequently analyzing the effectiveness of the method. Results: Findings confirmed that 19.3% of 264 identified impacts were resolved through 52 plans before Go Live. During Go Live, the remaining 213 impacts were addressed through 337 plans. Six months later, 190 impacts were confirmed, and the plans were considered effective or partially effective in 80.5% of cases. Conclusions: Effective governance, a multidisciplinary methodology, and well-planned and executed actions increase the likelihood of success for health technology projects.
Wang, Y.; He, H.; Zhu, R.; Lu, Y.; Phadungsaksawasdi, P.; Peng, M.; Liu, Z.; Zou, K.; Zhang, Y.; Chew, S. P.; Tham, Y. C.; Khorasani, A.; Deng, H.; Cheng, C.-Y.; Yang, J.; Liu, D.
Background Patients worldwide receive healthcare in many languages, yet medical AI systems are validated almost exclusively in high-resource languages such as English and Chinese, exposing patients in other linguistic settings to unquantified diagnostic risk. Existing multilingual evaluations rely on translated research-style benchmarks that fail to capture authentic clinical work. We aimed to characterise the patient safety consequences of multilingual medical AI deployment in real-world clinical settings and to develop an auditable detection method for unsafe outputs. Methods We evaluated different language models (LLMs) and visual language models (VLMs) across four real-world clinical tasks (conversational QA, radiology report generation, glaucoma diagnosis, ICU re-intubation prediction) in five languages (English, Chinese, Malay, Thai, Persian). We developed a token-level uncertainty toolkit to localise reasoning instability, compared three inference paradigms (native-language, English chain-of-thought, back-translation pivot), and conducted a prospective study (50 dialogues, 150 physician-reviewed records). Findings LLM/VLM performance degraded consistently from high- to low-resource languages across all tasks. Key gaps included: HealthBench score declining from 0.3743 to 0.3180; radiology macro-F1 from 0.2938 to 0.2149-0.2424, consistent with selective pathology suppression; glaucoma accuracy from 50.7% to 32.7%; ICU parameter recall from 100.0% to 48.5%. Multimodal inputs amplified degradation. Qwen3 VL 235B showed attenuated decline with no resource-ordered pattern in glaucoma classification. Token-level analysis localised instability to mid-chain stages (40-70% of the normalised trajectory); perplexity-based confidence failed to flag errors (AUROC 0.41-0.66). Back-translation pivot consistently restored performance. 
In the prospective study, 98.7% of records required physician edits (overall modification score 53.6%); Thai-pivot correction burden (59.0%) exceeded English-pivot (50.7%, p=0.003) and Chinese-direct (51.0%, p=0.004). Interpretation Multilingual deployment produced clinically consequential failures, including missed pathology, distorted physiological extraction, and amplified multimodal misclassification, that were invisible to monolingual validation and not reliably flagged by model confidence. Pretraining data composition may contribute to multilingual safety risk. Language-specific safety auditing should precede deployment in non-dominant-language healthcare settings; the open-source detection toolkit enables this without model retraining.
Golshani, P.; Joseph, M. S.
The US Food and Drug Administration (FDA) maintains a public list of artificial intelligence and machine learning (AI/ML)-enabled medical devices that have received marketing authorization. Prior published analyses examined this list at earlier time points and reported a marked dominance of radiology applications. We performed a cross-sectional analysis of all 1,430 AI/ML-enabled medical device authorizations recorded by the FDA between September 1995 and December 2025 to characterize the cumulative growth, specialty distribution, and manufacturer concentration of authorized devices. The annual authorization volume increased from a mean of 1.8 per year between 1995 and 2014 to 264 per year between 2023 and 2025, with 331 authorizations recorded in 2025 alone. Devices reviewed by the FDA's Radiology panel accounted for 1,094 of 1,430 authorizations (76.5%), and the three most represented panels (Radiology, Cardiovascular, and Neurology) accounted for 90.6% of all authorizations. Several large clinical specialties were represented by very small numbers of authorized devices, including Pathology (n = 9), Microbiology (n = 6), and Obstetrics and Gynecology (n = 4). No authorizations were recorded under a psychiatry or behavioral health review panel. Of 740 unique companies, 502 (67.8%) had a single authorized device, while 13 companies (1.8%) accounted for 217 devices (15.2%). The cumulative regulatory record demonstrates rapid growth that has been concentrated in image-rich diagnostic specialties, with limited representation across many specialties that account for substantial clinical activity in the United States. These findings may inform policy discussions about where regulatory, infrastructure, and dataset investments are most needed to broaden the clinical scope of medical AI.
Roy, J.; Korleski, J. B.; Augustin, R. C.; Yefet, L.; Jensen, Z. D.; Ehman, E. C.; Zadeh, G.; Conners, A. L.; Tevaarwerk, A. J.; Korfiatis, P.
Background: Preparing tumor board patient summaries is time intensive. Large-language-model based systems may automate summarization but require real-world evaluation prior to clinical use. We performed an exploratory retrospective evaluation of the Microsoft Healthcare Agent Orchestrator (HAO), deployed in a Mayo Clinic controlled staged environment, to generate tumor board-style patient summaries from retrospective Electronic Health Record (EHR) notes. Methods: HAO generated summaries for breast, hepatobiliary, and neuro-oncology tumor board cases using up to the most recent 1,000 clinical notes. Clinician reviewers evaluated outputs via REDCap surveys across perceived factuality, completeness, clarity/conciseness, temporal cohesion, comparative performance, safety, and clinical utility (0-4 Likert scale). Reviewers were permitted to query the HAO chat interface to address missing details. Automated factuality was assessed using TBFact (bidirectional entailment), reporting precision and recall against available reference summaries. Results: Among 57 survey responses from 5 different physicians, mean scores exceeded 2.8 across domains, with medians of 3 for most axes. In an exploratory comparison, oncology fellows required less time to review HAO-generated summaries than to manually generate patient summaries (mean difference 13.57 minutes per patient, p<0.001), although this difference may be influenced by prior familiarity with the same cases; 96% of survey responses indicated that HAO would save time. TBFact evaluations showed higher recall than precision across domains, consistent with broad capture of reference content alongside additional content that was not present in gold-standard summaries. Attribution was viewed favorably but showed issues with primary-source specificity and link reliability. Conclusions: In a controlled Mayo environment, HAO demonstrated moderate performance and was associated with reduced review time for tumor board preparation. 
These findings are promising but preliminary and do not establish clinical safety, noninferiority to manual review, or readiness for routine clinical use. Limitations, including verbosity, specialty-specific content gaps, and inconsistent attribution, highlight the need for iterative refinement and further evaluation.
Hussein, G.; AlShammri, M.; Aldosari, M.; Alshehri, R.; Almasari, G.; Alabdulrahman, R.; Alarfaj, R.; Alrashed, A.; Al-Walah, M. A.
The integration of artificial intelligence (AI) in cardiology requires healthcare worker acceptance for successful implementation. Understanding attitudes and educational needs is crucial for developing effective training programs. A cross-sectional survey was conducted among 408 healthcare workers treating cardiac diseases in Riyadh, Saudi Arabia. We assessed AI acceptance, knowledge levels, and training preferences using validated scales. Statistical analyses included descriptive statistics, chi-square tests, correlation analysis, reliability testing, and multiple logistic regression. Of 408 participants, 407 provided complete responses. The sample comprised predominantly young (87.0% aged ≤30), female (75.7%) medical residents (89.9%) with limited AI experience (86.7% never used AI clinically). Internal consistency was excellent (Cronbach's α = 0.892). Moderate acceptance was observed: 49.9% were aware of AI applications in cardiology, 46.7% were willing to learn, and 42.8% were willing to use AI clinically. However, 49.1% acknowledged lacking sufficient AI knowledge. Logistic regression identified willingness to learn (OR = 3.24, 95% CI: 2.15-4.89) and training interest (OR = 2.87, 95% CI: 1.94-4.25) as the strongest predictors of AI acceptance. The model explained 68.4% of variance (Nagelkerke R² = 0.684) with an AUC of 0.847. Medical residents demonstrate moderate AI acceptance but significant knowledge gaps. Educational interventions--particularly hands-on learning and institutional training programs--are the strongest drivers of AI readiness, surpassing demographic predictors. Integrating AI literacy systematically into medical curricula is essential for successful AI adoption in cardiovascular care.
Author summary: Healthcare workers worldwide are increasingly encountering artificial intelligence (AI) tools in clinical settings, yet their readiness to adopt these technologies--particularly in specialized fields like cardiology--remains poorly understood, especially in rapidly developing healthcare systems. In this study, we surveyed 407 healthcare workers in Riyadh, Saudi Arabia, to understand their current attitudes, knowledge gaps, and learning preferences regarding AI in cardiac diagnosis. Our findings reveal that while most participants hold cautious optimism about AI, nearly half acknowledge lacking the knowledge needed to use it confidently. Crucially, we found that educational factors--specifically willingness to learn and interest in institutional training--were far stronger predictors of AI acceptance than demographic characteristics such as age or gender. This means that AI readiness is not a fixed trait determined by who someone is, but a teachable and trainable capacity. These results carry direct implications for medical educators and policymakers: structured, hands-on AI training integrated throughout medical curricula can meaningfully accelerate adoption of beneficial technologies in cardiovascular care and beyond.
Kuria, T.; Kamau, G.; Makokha, F.; Omondi, P.; Mbugua, G.; David, K.; Mbugua, S.; Gitaka, J.
Introduction: Timely, protocol-adherent clinical decisions are crucial for reducing neonatal mortality in low-resource settings. Translating extensive national guidelines into bedside practice remains challenging. Objective: We developed and evaluated AIFYA, a human-supervised, large language model (LLM)-based clinical decision support system (CDSS) aligned with Kenya's national newborn care protocols. Methods: This prospective, mixed-methods, early-stage evaluation, guided by the DECIDE-AI framework, embedded AIFYA into routine workflows at two public health facilities (Level 5 and Level 4) in Bungoma County, Kenya, from September 2024 to June 2025. Primary outcomes were adoption (measured by cumulative neonatal cases managed), training reach (assessed by credentialed healthcare workers, HCWs), and guideline and citation concordance (evaluated through blinded review of 118 AI-generated recommendations by two neonatologists, with adjudication by a third). Secondary outcomes included protocol adherence and triage-to-decision time. Results: A total of 50 HCWs were trained and 550 neonatal cases were managed over 10 months. Among surveyed HCWs (n = 33), 76% were female, with a mean age of 32.1 years. Expert review found 75% of recommendations were correct and 15% partially correct, with strong inter-rater reliability (weighted Cohen's kappa 0.85, 95% CI 0.79 to 0.91). Citation accuracy was 96%. In 40 complex dosing scenarios, 75% of outputs were rated correct. The median triage-to-decision time was 23 minutes (interquartile range 18 to 31). Implementation was supported by an offline-first architecture and a facility-based coaching model, sustaining engagement despite staff turnover. Conclusion: A human-supervised AI CDSS directly and transparently anchored to national clinical guidelines can be successfully implemented in routine low-resource neonatal care settings. The system demonstrated high user adoption and strong expert-rated concordance.
High citation accuracy builds clinical trust, ensuring safety and enabling auditable AI. These findings support progression to controlled multi-site trials to evaluate clinical effectiveness. Keywords: Neonatal care; Clinical decision support system; Large language model; Artificial intelligence; Human supervised; Low resource settings; Guideline adherence; Digital health; Kenya
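The inter-rater statistic reported above is a weighted Cohen's kappa. The abstract does not say which weighting scheme was used, so the following is only a minimal sketch that assumes linear disagreement weights and integer-coded ordinal ratings (for example 0 = incorrect, 1 = partially correct, 2 = correct). The function name `weighted_kappa` and the coding are illustrative assumptions, not the authors' method.

```python
from typing import Sequence


def weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                   n_categories: int) -> float:
    """Linear-weighted Cohen's kappa for two raters (illustrative sketch).

    Ratings are assumed to be integers in range(n_categories).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")

    k, n = n_categories, len(rater_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        # Linear disagreement weight: 0 on the diagonal, 1 at the extremes.
        return abs(i - j) / (k - 1)

    cells = [(i, j) for i in range(k) for j in range(k)]
    disagreement_observed = sum(weight(i, j) * observed[i][j] for i, j in cells)
    disagreement_expected = sum(weight(i, j) * row[i] * col[j] for i, j in cells)

    if disagreement_expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - disagreement_observed / disagreement_expected
```

Under this definition, identical ratings give a kappa of 1 and agreement no better than chance gives a value near 0. Quadratic weights would penalize large disagreements more heavily and could yield a different value on the same data.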
Appiagyei, J. B.; Otu, R. O.; Henry, M. K.; Casterline, B. W.; Becevic, M.
Teledermatology expands access to dermatologic expertise in rural settings, yet diagnostic uncertainty persists in low-resource primary care. This retrospective study evaluated MedGemma-4B-IT, a compact multimodal vision-language model, as adjunctive clinical decision support for challenging diagnostic cases. We analyzed 77 zero-concordance cases (360 clinical photographs) from a Dermatology Extension for Community Healthcare Outcomes (ECHO) tele-mentoring program (2016-2021). Zero-concordance cases showed no overlap between primary clinician provisional diagnosis and dermatologist-confirmed diagnosis. The model was prompted using dermatologist-style format to generate ranked differential diagnoses. Performance was assessed using strict case-level top-k exact-match accuracy and relaxed matching criteria based on fuzzy string similarity. MedGemma achieved 0.0% strict top-1 accuracy, 1.3% top-3 accuracy, 3.9% top-5 accuracy, and 3.9% top-10 accuracy. Relaxed concept-level matching achieved 28.6% top-1, 63.6% top-5, and 67.5% top-10 accuracy. Image-level accuracy was 44.2% (159/360, 95% CI 39.0-49.5%). The model surfaced the correct diagnosis within differential lists in 45.5% of cases despite no exact top-1 matches, suggesting utility for differential expansion rather than definitive diagnosis. Performance varied across diagnostic categories, with highest accuracy in Other categories (54.5%) and lowest in neoplastic conditions (0.0%). Common errors included confusion between inflammatory and other diagnostic groupings. These findings characterize MedGemma performance on real-world teledermatology cases and inform safe, clinician-in-the-loop integration into teledermatology workflows where specialist oversight remains essential.
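The gap between strict and relaxed accuracy in this abstract comes from how a predicted label is judged to match the confirmed diagnosis. The sketch below illustrates case-level top-k accuracy under both criteria. It assumes a simple similarity ratio from Python's standard `difflib` and an arbitrary threshold of 0.6, since the abstract does not specify the fuzzy-matching method or cutoff. All function names and the example labels are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(label.lower().split())


def is_match(predicted: str, truth: str, threshold: Optional[float] = None) -> bool:
    """Exact match when threshold is None, otherwise fuzzy string similarity."""
    p, t = normalize(predicted), normalize(truth)
    if threshold is None:
        return p == t
    return SequenceMatcher(None, p, t).ratio() >= threshold


def top_k_accuracy(cases: List[Tuple[List[str], str]], k: int,
                   threshold: Optional[float] = None) -> float:
    """Fraction of cases whose truth label matches any of the top-k predictions.

    Each case is (ranked differential list, confirmed diagnosis).
    """
    hits = sum(
        any(is_match(pred, truth, threshold) for pred in ranked[:k])
        for ranked, truth in cases
    )
    return hits / len(cases)


# Hypothetical example cases, not data from the study.
example_cases = [
    (["eczema", "plaque psoriasis"], "psoriasis"),
    (["tinea corporis"], "melanoma"),
]
strict_top2 = top_k_accuracy(example_cases, k=2)
relaxed_top2 = top_k_accuracy(example_cases, k=2, threshold=0.6)
```

In the example, "plaque psoriasis" fails the strict criterion against "psoriasis" but passes the relaxed one, which is the kind of near-miss that can separate exact-match from concept-level scores.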
Shankar, R.; Xu, Q.
Background: Ambient AI scribes are rapidly entering clinical workflows, yet end-user perspectives remain underrepresented in the peer-reviewed literature. Online clinician communities offer an unfiltered window into adoption barriers, perceived benefits, and product-level concerns. Objective: To characterise themes and sentiment in clinician discourse on ambient AI scribes across professional Reddit communities. Methods: We scraped posts from ten clinically oriented subreddits using twelve AI scribe related queries via the public Reddit JSON API. A two-tier keyword filter retained posts mentioning at least one AI scribe term and one clinical or workflow term. Texts were embedded with all-MiniLM-L6-v2, reduced via UMAP, clustered with HDBSCAN, and labelled using BERTopic with c-TF-IDF keyword extraction. Noise topics matching predefined off-topic patterns (for example, residency match, finance) were removed. Themes were assigned concise labels via Claude Sonnet 4. Sentiment was classified per post using cardiffnlp/twitter-roberta-base-sentiment-latest. Results: After filtering, 176 unique relevant posts from seven active subreddits were retained, with r/FamilyMedicine (n = 64) and r/healthIT (n = 34) dominating. BERTopic produced 12 coherent themes spanning workflow integration, vendor comparison (DAX, Heidi, Freed, Abridge), HIPAA and privacy, mobile and device use, templates and formatting, and research versus clinical use. Overall sentiment was 61.4% neutral, 21.6% positive, and 17.0% negative. The most net-positive theme was DAX/Nuance/AI tools (about 55% positive); the most net-negative were charting fatigue and the freed-AI-scribes discussion thread (about 37 to 40% negative). Engagement (median upvotes and comments) was highest for tool-comparison and pricing themes, indicating salience of practical adoption questions. Conclusions: Clinician sentiment toward ambient AI scribes is cautiously favourable but dominated by neutral, problem-solving discourse.
Vendor selection, cost, HIPAA compliance, and EHR integration are the most actively debated issues. These insights can inform implementation strategy, vendor benchmarking, and policy guidance for ambient documentation tools.
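The two-tier keyword filter described in the Methods keeps a post only if it mentions at least one AI scribe term and at least one clinical or workflow term. A minimal sketch of that logic follows. The term lists here are invented for illustration, because the abstract does not give the actual vocabularies, and plain substring matching is an assumption.

```python
from typing import Iterable, List, Set

# Hypothetical term lists; the study's real vocabularies are not given.
AI_SCRIBE_TERMS: Set[str] = {"ai scribe", "ambient scribe", "dax", "abridge"}
CLINICAL_TERMS: Set[str] = {"charting", "ehr", "clinic", "documentation", "notes"}


def contains_any(text: str, terms: Iterable[str]) -> bool:
    """Case-insensitive substring check against a set of terms."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def two_tier_filter(posts: Iterable[str]) -> List[str]:
    """Keep posts that satisfy both tiers: an AI scribe term AND a clinical term."""
    return [
        post for post in posts
        if contains_any(post, AI_SCRIBE_TERMS) and contains_any(post, CLINICAL_TERMS)
    ]


# Hypothetical example posts.
sample_posts = [
    "Tried an AI scribe for charting last week",
    "Residency match advice?",
    "DAX pricing is wild",
]
kept = two_tier_filter(sample_posts)
```

Requiring both tiers trades recall for precision: a post that names a vendor but never touches clinical workflow is dropped. Substring matching is crude and can over-match short terms inside longer words, so a real pipeline would likely use word-boundary matching.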
Namian, S.; DiBiase, R.; Elnazer, S. H.; Evers, C.; Fung, C.; Narula, R.; Rafferty, M.; Salahuddin, A.; Sardana, D. J.; Shea, J.; Sullivan, M.; Forman, R.
Background: High school students may be able to communicate health topics to peers and adults. Yet, few studies have evaluated the role of high school students in community health initiatives, making them an underutilized group for disseminating health information. We pilot tested stroke education across five high schools using varied delivery approaches as a preliminary step toward evaluating youth stroke education to improve community health. Methods: In April-May 2025, five high schools in Connecticut and New York participated in stroke education. The format was designed to fit the needs of each school and included an 8-session classroom curriculum (Derby, CT), after-school club meetings (New Haven, CT; Long Island, NY), and one large assembly (Bridgeport, CT). Developed by teachers and neurology providers, the curriculum covered stroke risk factors, symptoms, and emergency response. Students completed a 15-point assessment adapted from the validated Stroke Action Test before, immediately after, and 4-6 weeks post-intervention; data were collected between April and July 2025. Results: Of 112 students completing the pre-test, 99 (88%) completed the immediate post-test and 51 (46%) the delayed follow-up. Average scores rose from 47% pre-intervention to 75% post and 70% at 4-6 weeks. All schools scored <50% on pre-tests suggesting poor baseline stroke knowledge. Conclusion: This pilot suggests that stroke education can be delivered to high school students across varied settings and may support knowledge gains up to 6 weeks. Limitations included small sample sizes and missing follow-up data. If validated in larger studies, this adaptable, teacher-supported approach could offer a scalable public health strategy for improving community stroke preparedness.